We consider the design of time-invariant recursive filters of constrained
order for one-step prediction of discrete-time stationary processes. For 
this purpose, we introduce the projecting-filter concept. An nth-order 
projecting filter for a given process has the characterizing property that 
with the process as input, the output at each instant is the optimal linear 
combination of the n previous output and n latest input samples. This 
definition implies that (i) the filter is stable, (ii) any n + 1 consecutive 
samples of the prediction error sequence are mutually uncorrelated, (iii)
the mean-square prediction error is at least as low as that of the best nth 
order nonrecursive predictor, and (iv) if the spectral density of the process 
is rational of order 2n or less, then the nth-order projecting filter coincides 
with the optimal (unconstrained) linear predictor. 

A design algorithm for nth-order projecting filters iteratively generates 
successive sets of coefficients of a time-varying nth-order recursive filter 
which asymptotically approaches the desired time-invariant filter. The 
only input data needed for the algorithm are the autocovariance coefficients 
of the process to be predicted. When the order of the filter is matched to
the order of the process, the time-varying filter is the same as the Kalman
predictor. The algorithm has yielded effective projecting filters for several
specific processes. Our results indicate that near optimal prediction may 
often be obtained with filters of order lower than that of the optimal uncon- 
strained predictor. 

I. INTRODUCTION 

Although the optimal linear predictor of a random process must 
make use of the entire past of the process, any practical predictor can 
store only a finite number of data. One way to design a finite storage 
predictor is to determine the best linear combination of the n latest 
sample values of the process. However, for many processes, a large 

2378 THE BELL SYSTEM TECHNICAL JOURNAL, NOVEMBER 1970 

value of n is required to achieve a performance quality approaching 
that of the unconstrained optimal linear predictor. An alternate 
approach is to find the best recursive predictor constrained to operate 
only on the n latest data samples and the n latest predictions. This 
approach has the advantage of using condensed information from the 
entire past of the process with the consequence that optimal or near 
optimal prediction can often be achieved with a relatively small 
amount of storage. 

The purpose of this paper is to introduce the projecting-filter ap- 
proach to recursive prediction and to present an algorithm for the 
design of projecting filters that has yielded effective low-order predictors
not otherwise attainable. So far, a complete theory of projecting
filters has not been established. We do not yet know how broad
is the class of processes which possess projecting filters of a given 
order; nor have we determined the class of processes for which our 
design algorithm is effective. However, we can report very favorable 
experience in the design of projecting filters for a variety of specific 
processes. We have also established some important theoretical proper- 
ties of projecting filters. 

1.1 Optimal and Finite Memory Predictors 

In certain special cases the optimal (least mean-square error) un- 
constrained predictor is realizable with a finite-storage filter. 1 In 
particular, for an nth-order autoregressive, or wide-sense Markov, 
process the optimal unconstrained predictor is a finite-memory non- 
recursive filter operating only on the n latest data samples. More 
generally, the optimal unconstrained predictor of any stationary proc- 
ess whose spectral density is rational of order 2n may be implemented 
as an nth-order recursive filter. The characteristics of the optimal filter
may be determined by applying the discrete-time form of Wiener's 
spectral factorization technique. Even more generally, consider any 
nonstationary process which can be modeled as the response of an 
nth-order linear time-varying recursive filter to an uncorrelated noise 
input. The optimal unconstrained predictor is an nth-order time- 
varying recursive filter 2 which may be determined by use of the 
Kalman filtering equations, 3 or, more efficiently, by a generalization 
of the approach taken in Section VI of this paper. 

If a random process cannot be modeled as the response of an nth- 
order recursive filter to an uncorrelated input, then the optimal un- 
constrained one-step linear predictor cannot be realized by an nth- 
order filter. Nevertheless, it is realistic to preselect the desired order, 
n, of the predictor and to seek the best recursive filter of this order.
In this way the structure of the predictor is conveniently specified 
for digital filter implementation while only the 2n parameter values 
need be supplied according to the process to be predicted. Un- 
fortunately, with the least-mean-square error criterion, the con- 
strained-order prediction problem is a special case of the unsolved 
problem of $L_2$ rational approximation on the unit circle. 4 No analytical
solution is known and optimization search techniques are severely 
hampered by the multimodal nature of the error surface.* 

1.2 Projecting Filters 

In this paper we introduce the projecting filter principle of recursive 
prediction. Although the projecting filter is not a solution of the $L_2$
rational approximation problem, it has the local optimality property 
that at each step it forms the best linear combination of the available 
data. The term "projection" alludes to the geometrical interpretation 
of random variables as vectors in Hilbert space. 6,7 Each prediction
error of the projecting filter is a vector orthogonal to the n most recent 
inputs and the n previous errors. Hence the projecting filter performs a 
partial whitening of the input process. In this sense it approximates 
the action of the optimum unconstrained predictor, the error of which 
is a white-noise process — the innovations process of the input. If the 
input can be represented as the response of an nth-order filter to white 
noise, the nth-order projecting filter is the optimum unconstrained pre- 
dictor. For any process, the mean-square error of a projecting filter is 
never greater than the mean-square error of the optimum nonrecursive 
filter of the same order. Projecting filters are stable. 

1.3 An Example 

These properties of projecting filters are observed in the example of 
the eighth-order process $\{x_k\}$ represented by

$$x_k = \epsilon_k - 0.8\epsilon_{k-1} + 0.5\epsilon_{k-2} + 0.25\epsilon_{k-3} - 0.6\epsilon_{k-4} - 0.2\epsilon_{k-5} + 0.1\epsilon_{k-6} + 0.4\epsilon_{k-7} - 0.08\epsilon_{k-8}$$

in which $\{\epsilon_k\}$ is a stationary white-noise process with zero mean and
unit variance. The power spectral density function of $\{x_k\}$ has zeros
at the 16 points in the z-plane indicated in Fig. 1. The eighth-order
projecting filter for $\{x_k\}$, which is the optimum unconstrained predictor,
has poles at the eight locations indicated in Fig. 1 that are outside
the unit circle. The pole positions of a seventh-order projecting filter
are shown in Fig. 2. There are poles extremely close to all of the
locations outside the unit circle indicated in Fig. 1, except the one furthest
from the origin. Figures 3, 4, and 5 indicate the pole locations of the
sixth-, third-, and first-order projecting filters, respectively. The poles
of these filters do not coincide with zeros of the power spectral density
function of $\{x_k\}$.

*The complexity of the error as a function of the parameters is evidenced
by the work of R. S. Phillips 5 on the corresponding continuous-time problem.

Fig. 1 - Locations of zeros of the spectral density function of an eighth-order
process. The eighth-order projecting filter has poles at the zero locations that are
outside the unit circle.

Fig. 2 - Pole locations of seventh-order projecting filter.

Fig. 3 - Pole locations of sixth-order projecting filter.

Figure 6 demonstrates the projecting-filter mean-square-error
performance for this process. Here the horizontal base line is the optimal
unconstrained prediction error. The white bars indicate errors of optimal
constrained order nonrecursive predictors and the shaded bars are
the errors of the projecting filters. It is significant that the error of
the seventh-order projecting filter is extremely close to the optimum 
linear-prediction error; the ratio of the two errors is approximately 
1 + 10- r . By using the projecting filter approach to prediction, we have 
discovered a means of reducing predictor complexity with virtually no 
loss in accuracy. In addition, Fig. 6 shows the error resulting from 
low-order recursive filters and the advantages relative to nonrecursive
prediction. 
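
As a minimal numerical sketch (not part of the original design procedure), the autocovariance coefficients of a moving-average process such as the example of this section follow directly from its weights. The function name `ma_autocovariance` is hypothetical, and the coefficient list is our reading of the example equation; in particular the lags attached to the sixth and seventh terms are assumed.

```python
import numpy as np

def ma_autocovariance(beta):
    """Autocovariances r_0..r_q of x_k = sum_i beta[i] * eps_{k-i},
    where eps is zero-mean, unit-variance white noise (hypothetical helper)."""
    beta = np.asarray(beta, dtype=float)
    q = len(beta) - 1
    # r_k = sum_i beta[i] * beta[i + k]
    return np.array([np.dot(beta[: q + 1 - k], beta[k:]) for k in range(q + 1)])

# Weights of the eighth-order example as read from the text.
# Assumption: the terms are attached to lags 0, 1, ..., 8 in order.
beta = [1.0, -0.8, 0.5, 0.25, -0.6, -0.2, 0.1, 0.4, -0.08]
r = ma_autocovariance(beta)
```

These nine numbers are the only process data that a covariance-based design would need for this example, since all autocovariances at lags beyond 8 vanish.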

1.4 Organization of the Paper 

The content of the paper falls into two categories. Some sections
contain descriptive and analytic material relevant to predictors and
projecting filters in general and other sections pertain to the particular
design method that has been used in synthesizing the predictors
described in Section 1.3. Sections II, III and IV are in the first category;
they define the prediction problem and the projecting-filter principle
and focus attention on the essential properties of unconstrained
predictors and projecting filters. Section V introduces the design method,
an iterative scheme based upon successive projections in Hilbert space.
This technique leads to a time-varying filter that asymptotically tends
towards the desired projecting filter. Section VI shows that when the
order of the filter is matched to that of the process, the design algorithm
converges and the projecting-filter approach results in an efficient
analysis and design (equivalent to but simpler than the Kalman filtering
equations) for the unconstrained optimum time-varying filter with
a given initial state. Section VII presents a derivation of the design
algorithm.

Fig. 4 - Pole locations of third-order projecting filter.

Fig. 5 - Pole location of first-order projecting filter.

Fig. 6 - Mean-square errors of projecting filters and optimal nonrecursive
filters of orders 1 through 8.

II. PROBLEM STATEMENT 

We consider a purely-nondeterministic* stationary process $\{x_k\}$ with
known covariance function, $r_k = E x_i x_{i-k}$. We assume that the
spectral density function of the process, $f(z) = \sum_k r_k z^k$, has no zeros on
the unit circle, $|z| = 1$. The purpose of this paper is to describe a new
approach to the design of a stable one-step predicting filter with the
nth-order recursive structure

$$y_k = \sum_{i=0}^{n-1} a_i x_{k-i} + \sum_{i=1}^{n} b_i y_{k-i}. \tag{1}$$

A natural measure of the performance of the predictor is the mean- 
square value of the prediction error 

$$e_{k+1} = x_{k+1} - y_k. \tag{2}$$
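
The recursive structure of equation (1) and the error of equation (2) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `recursive_predict` is hypothetical, the memory is taken to be zero before the first sample, and `b[i-1]` stores the coefficient $b_i$.

```python
import numpy as np

def recursive_predict(x, a, b):
    """Run y_k = sum_{i=0}^{n-1} a[i] x_{k-i} + sum_{i=1}^{n} b[i-1] y_{k-i}
    with zero initial memory (assumption). Returns (y, e) where
    e[k] = x[k+1] - y[k] is the one-step prediction error."""
    x = np.asarray(x, dtype=float)
    n = len(a)
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = 0.0
        for i in range(n):              # feedforward part: latest n inputs
            if k - i >= 0:
                acc += a[i] * x[k - i]
        for i in range(1, n + 1):       # feedback part: previous n outputs
            if k - i >= 0:
                acc += b[i - 1] * y[k - i]
        y[k] = acc
    e = x[1:] - y[:-1]
    return y, e
```

For example, `recursive_predict([1, 2, 3, 4], a=[0.5], b=[0.5])` runs a first-order filter over four samples; the mean-square value of `e` over a long record would estimate the performance measure discussed here.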

Because the determination of the optimum filter coefficients with re- 
spect to this criterion is an intractable problem of approximation 
theory, our design method is based on a different performance objec- 
tive. Rather than synthesize the least-squares nth-order recursive 
filter, we seek a stable time-invariant filter with the following 

Projecting property: With input $\{x_k\}$, the output, $y_k$, is, at each
instant k, the least mean-square estimate of $x_{k+1}$ formed as a linear
combination of the data $\{x_k, x_{k-1}, \ldots, x_{k-n+1}, y_{k-1}, \ldots, y_{k-n}\}$
currently in the filter memory.

This implies that the filter coefficients $a_i$ and $b_i$ satisfy a set of linear
equations involving the covariance functions of $\{x_k\}$ and $\{y_k\}$. The
autocovariance of $\{x_k\}$ corresponds to the given data of the prediction
problem but the cross-covariance between $\{x_k\}$ and $\{y_k\}$ and the
autocovariance of $\{y_k\}$ are transcendental functions of $a_i$ and $b_i$. It
follows that an explicit solution for the coefficients from the con- 
straints imposed by the projecting property is not possible. An algo- 
rithmic solution is presented in Section VII. 



* See Ref. 1, p. 23. 

III. UNCONSTRAINED PREDICTION

We refer to the problem defined in Section II as a constrained-order 
prediction problem because the order, n, of the predictor is prespecified. 
Another problem, which we refer to as unconstrained linear prediction, 
has received considerable attention in the literature of stochastic 
processes. 1,8 The optimum unconstrained prediction, $\hat{x}_{k+1}$, of $x_{k+1}$ is
the least mean-square linear combination of the entire past, $x_k, x_{k-1}, \ldots$,
of $\{x_k\}$. In the terminology of the Hilbert space description of random
variables, $\hat{x}_{k+1}$ is called the projection of $x_{k+1}$ into the past of $\{x_k\}$, and
we thus adopt the following convenient notation:

$$\hat{x}_{k+1} = P\{x_{k+1} \mid x_k, x_{k-1}, \ldots\}. \tag{3}$$

When $\{x_k\}$ is Gaussian, the projection coincides with the conditional
expectation.

3.1 The Error Process

The error process $\{v_k\}$, defined by

$$v_{k+1} = x_{k+1} - \hat{x}_{k+1}, \tag{4}$$

is the innovations process of $\{x_k\}$. It has the key orthogonality properties:

$$E v_{k+1} x_{k-i} = 0, \quad i = 0, 1, 2, \ldots; \tag{5}$$

$$E v_{k+1} v_{k-i} = 0, \quad i = 0, 1, 2, \ldots. \tag{6}$$

Equation (5), which characterizes the projection operation, indicates
that the best linear predictor cannot make better use of the past of
$\{x_k\}$. Equation (6), a direct consequence of equation (5) because each
$v_{k-i}$ is a linear combination of $x_{k-i}, x_{k-i-1}, \ldots$, shows that
the error process is white noise.

3.2 Stability 

The optimal unconstrained prediction, $\hat{x}_{k+1}$, may be characterized
as the limit of an infinite sequence of constrained-order nonrecursive
predictions:

$$\hat{x}_{k+1} = \lim_{n \to \infty} \sum_{i=0}^{n-1} h_{in} x_{k-i} \tag{7}$$

where $h_{in}$ ($i = 0, 1, \ldots, n-1$) are the coefficients of the optimum
nth-order nonrecursive predictor which may be calculated by means
of well-known quadratic minimization techniques. The unconstrained
predictor is a stable function of the data in the sense that

$$\lim_{n \to \infty} \sum_{i=0}^{n-1} h_{in}^2 < \infty. \tag{8}$$

This is proved in Section IV.
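
One standard way to carry out the quadratic minimization for the nth-order nonrecursive predictor is to solve the normal equations built from the autocovariances. The sketch below assumes that approach; the function name `nonrecursive_predictor` is hypothetical, and `r` must hold at least $r_0, \ldots, r_n$.

```python
import numpy as np

def nonrecursive_predictor(r, n):
    """Coefficients h[0..n-1] minimizing E(x_{k+1} - sum_i h[i] x_{k-i})^2,
    obtained from the normal equations R h = (r_1, ..., r_n).
    Returns (h, mse). Requires len(r) >= n + 1."""
    r = np.asarray(r, dtype=float)
    # Toeplitz covariance matrix of (x_k, ..., x_{k-n+1})
    R = np.array([[r[abs(i - j)] for j in range(n)] for i in range(n)])
    rhs = r[1 : n + 1]                 # E x_{k+1} x_{k-i} = r_{i+1}
    h = np.linalg.solve(R, rhs)
    mse = r[0] - h @ rhs               # error power by orthogonality
    return h, mse
```

For a first-order autoregression with $r_k = 0.5^k$, calling the function with `n = 2` returns a second coefficient of zero, in line with the remark in Section 1.1 that the optimal predictor of an nth-order autoregressive process uses only the n latest samples.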

3.3 Process Representation 

We say $\{x_k\}$ is of nth order if it can be represented as the response
of a stable recursive nth-order filter to white noise so that

$$\sum_{i=0}^{n} \alpha_i x_{k-i} = \sum_{i=0}^{n} \beta_i \epsilon_{k-i} \tag{9}$$

in which $\alpha_n$ or $\beta_n$ is nonzero, $\{\epsilon_k\}$ is a white-noise process, and $\sum_i \alpha_i z^i$
has no zeros in $|z| \le 1$. If $\{x_k\}$ is of order n, it is known that there exists
an nth-order recursive filter which generates $\{\hat{x}_k\}$ in response to $\{x_k\}$.
The error process of this filter is $\{v_k\}$, the innovations process of $\{x_k\}$.
If $\sum_i \beta_i z^i \ne 0$ for $|z| \le 1$, then $v_k = \epsilon_k$.

Conversely, if $\{x_k\}$ does not possess an nth-order representation of
the form of equation (9), the best unconstrained predictor cannot be
realized by an nth-order filter. To prove this we assume that such a
realization does exist. That is, we assume

$$\hat{x}_{k+1} = \sum_{i=0}^{n-1} d_i x_{k-i} + \sum_{i=1}^{n} c_i \hat{x}_{k+1-i}. \tag{10}$$

This combined with equation (4) implies

$$x_{k+1} - \sum_{i=0}^{n-1} (d_i + c_{i+1}) x_{k-i} = v_{k+1} - \sum_{i=0}^{n-1} c_{i+1} v_{k-i} \tag{11}$$

which shows that $\{x_k\}$ is in fact the response of an nth-order filter to
the white-noise process $\{v_k\}$, which is a contradiction.

IV. PROJECTING FILTERS 

4.1 Orthogonality Properties

We have shown that an nth-order recursive filter cannot perform optimal
unconstrained linear prediction of a process of order greater
than n. With such a process as input, the error process $\{e_k\}$ of an
nth-order filter will necessarily have a higher mean-square value than that
of the innovations process and $\{e_k\}$ will fail to meet the orthogonality
conditions of equations (5) and (6). However, when the nth-order
predictor possesses the projecting property defined in Section II, its
error process satisfies some but not all of the orthogonality conditions
met by the innovations process. In particular, the projecting property
requires that

$$y_k = P\{x_{k+1} \mid x_k, x_{k-1}, \ldots, x_{k-n+1}, y_{k-1}, \ldots, y_{k-n}\} \tag{12}$$

which is characterized by the orthogonality conditions

$$E e_{k+1} x_{k-i} = 0, \quad i = 0, 1, \ldots, n-1; \tag{13}$$

$$E e_{k+1} e_{k-i} = 0, \quad i = 0, 1, \ldots, n-1. \tag{14}$$

Note that in this case, equation (14) is not a direct consequence of
equation (13). In fact equation (13) is satisfied by the error of the
optimum nth-order nonrecursive filter, while equation (14) is not
satisfied by this error unless $\{x_k\}$ is an nth-order autoregression, that is,
an nth-order process with $\beta_i = 0$ for $i > 0$.

4.2 Stability 

Projecting filters are inherently stable. In fact, some kind of stability
property is implicit in any statement of steady-state properties
of a time-invariant filter. In this paper we say that a filter is stable if
its impulse response is square summable, which implies, if the spectrum
is rational, that the filter transfer function is analytic on and in the
unit circle. We assume that the predicting filter has zero in each memory
element prior to $k = 0$, at which time $\{x_k\}$ is applied to the input.
The projecting property stated in Section II implies that in the limit
as $k \to \infty$, $y_k$ tends toward the projection indicated in equation (12).
Thus in the limit, the orthogonality conditions of equations (13) and
(14) are satisfied, from which it follows that $E e_{k+1} y_k \to 0$ and, since
$y_k + e_{k+1} = x_{k+1}$,

$$\lim_{k \to \infty} [E y_k^2 + E e_{k+1}^2] = E x_{k+1}^2 = r_0$$

from which we infer

$$\limsup_{k \to \infty} E y_k^2 < r_0. \tag{15}$$

We also know that the filter output for each $k \ge 0$ is the finite sum

$$y_k = \sum_{i=0}^{k} g_i x_{k-i} \tag{16}$$

in which $g_i$ is the filter impulse response. Equations (15) and (16)
imply the existence of a positive number c, which bounds the
mean-square output:

$$E y_k^2 < c, \quad \text{for all } k. \tag{17}$$

The existence of this bound leads to the following

Theorem: If a filter with impulse response $g_i$ is a projecting filter, it
is stable in the sense that

$$\sum_{i=0}^{\infty} g_i^2 < \infty. \tag{18}$$

Proof: In terms of $f(z)$, the power spectral density function of $\{x_k\}$,
and the frequency transfer function of the filter we have

$$E y_k^2 = \frac{1}{2\pi} \int_{-\pi}^{\pi} \Big| \sum_{m=0}^{k} g_m e^{im\omega} \Big|^2 f(e^{i\omega})\, d\omega \ge \lambda \sum_{m=0}^{k} g_m^2, \tag{19}$$

in which $\lambda = \min_{|z|=1} f(z) > 0$ according to the assumption stated
in Section II. Equations (17) and (19) may be combined in the
expression

$$\sum_{m=0}^{k} g_m^2 < c/\lambda, \quad \text{for all } k, \tag{20}$$

from which equation (18) follows.

The same reasoning leads to a proof of the stability of the unconstrained
predictor, with $g_i$ replaced by $h_{in}$, the impulse response of the
nonrecursive predictor described in Section 3.2.
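
The lower bound used in the proof, that the output power of any finite weighting of the process is at least the spectral minimum times the sum of squared weights, can be checked numerically. The sketch below is an illustration only: both function names are hypothetical, the spectral minimum is approximated on a frequency grid, and autocovariances beyond the supplied lags are assumed to be zero.

```python
import numpy as np

def spectral_min(r, num=4001):
    """Approximate lambda = min over |z| = 1 of f(z) = sum_k r_k z^k,
    evaluated on a uniform frequency grid (grid search is an assumption)."""
    r = np.asarray(r, dtype=float)
    w = np.linspace(-np.pi, np.pi, num)
    k = np.arange(1, len(r))
    # f(e^{iw}) = r_0 + 2 * sum_{k>=1} r_k cos(k w), since r_k = r_{-k}
    f = r[0] + 2.0 * np.cos(np.outer(w, k)) @ r[1:]
    return f.min()

def output_power(g, r):
    """E y^2 for y = sum_m g[m] x_{k-m}; lags beyond len(r)-1 are taken as zero."""
    g = np.asarray(g, dtype=float)
    r = np.asarray(r, dtype=float)
    m = len(g)
    rr = np.zeros(m)
    rr[: min(m, len(r))] = r[:m]
    T = np.array([[rr[abs(i - j)] for j in range(m)] for i in range(m)])
    return g @ T @ g
```

For the first-order moving average with $r_0 = 1.25$ and $r_1 = 0.5$, the spectral density is $1.25 + \cos\omega$, so the minimum is 0.25 and `output_power(g, r)` should never fall below `0.25 * sum(g**2)` for any weight vector `g`.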

V. PROJECTING-FILTER DESIGN APPROACH 

As we stated in Section II, an attempt to determine the filter co- 
efficients by directly combining equation (1) and equations (13) and 
(14) leads to an intractable set of transcendental equations relating 
the coefficients and the autocovariance function of $\{x_k\}$. On the other
hand, the iterative approach introduced in this paper leads to the 
computation of the desired coefficients by means of standard opera- 
tions of arithmetic and matrix algebra. 

Our design method results in a time-varying filter which, starting
with zero in all memory elements, sequentially predicts $x_1, x_2, \ldots$
according to the projecting principle. At each step the filter forms
the optimum linear combination of the available data.

Thus we define the process $\{x'_k\}$ such that

$$x'_k = 0, \quad k < 0; \qquad x'_k = x_k, \quad k \ge 0; \tag{21}$$

and we adopt as our prediction of $x_{k+1}$,

$$y_k = 0, \quad k < 0; \qquad y_k = P\{x_{k+1} \mid x'_k, x'_{k-1}, \ldots, x'_{k-n+1}, y_{k-1}, \ldots, y_{k-n}\}, \quad k \ge 0. \tag{22}$$

Equation (22) uniquely defines the time-varying linear transformation
which generates the nonstationary process $\{y_k\}$ from the stationary
process $\{x_k\}$.

At each step the prediction error of the time-varying filter meets
the orthogonality conditions of equations (13) and (14) so that $E e_{k+1} y_k = 0$
and therefore $E y_k^2 < r_0$ for all k. Following the proof of the theorem
in Section 4.2 we can show that with the filter output represented by

$$y_k = \sum_{i=0}^{k} g_{ik} x_{k-i}, \quad k \ge 0, \tag{23}$$

the time-varying filter possesses the stability property

$$\limsup_{k \to \infty} \sum_{i=0}^{k} g_{ik}^2 < \infty. \tag{24}$$

Furthermore, if this filter approaches the time-invariant projecting
filter with impulse response $g_i$ in the sense that

$$\lim_{k \to \infty} \sum_{i=0}^{k} (g_{ik} - g_i)^2 = 0, \tag{25}$$

we are assured that this filter is stable and that it has the desired
nth-order recursive structure. Hence if we determine, for each
k, $a_{ik}$ and $b_{ik}$ such that

$$y_k = \sum_{i=0}^{n-1} a_{ik} x'_{k-i} + \sum_{i=1}^{n} b_{ik} y_{k-i} \tag{26}$$

is equivalent to equation (22), then successive computation of these
coefficients leads to the desired time-invariant projecting filter.

Note that although $y_k$ is uniquely determined by equation (22),
the coefficients $a_{ik}$ and $b_{ik}$ in the representation of equation (26) are
not unique when the set of stored data is linearly dependent. This
situation is analyzed in Section 7.4.
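
The successive projections that define the time-varying filter can be illustrated by brute force. The sketch below is not the algorithm of Section VII: it represents every output $y_k$ explicitly as a weight vector on $x_0, x_1, \ldots$, and it handles linearly dependent stored data with a minimum-norm least-squares solve (`np.linalg.lstsq`) instead of eliminating variates. The function name and these choices are our assumptions.

```python
import numpy as np

def time_varying_projecting_filter(r, n, steps):
    """Brute-force illustration of y_k = P{x_{k+1} | stored data}.
    r: autocovariances r_0, r_1, ... (lags beyond len(r)-1 taken as zero).
    Returns (Y, mse): Y[k] holds the weights of y_k on (x_0, ..., x_steps),
    and mse[k] = E(x_{k+1} - y_k)^2."""
    r = np.asarray(r, dtype=float)
    N = steps + 1                                  # variables x_0 .. x_steps
    rr = np.zeros(N)
    rr[: min(N, len(r))] = r[:N]
    R = np.array([[rr[abs(i - j)] for j in range(N)] for i in range(N)])
    unit = np.eye(N)
    Y, mse = [], []
    for k in range(steps):
        # Stored data: the latest n inputs and the previous n outputs that exist.
        rows = [unit[k - i] for i in range(n) if k - i >= 0]
        rows += [Y[k - i] for i in range(1, n + 1) if k - i >= 0]
        V = np.array(rows)
        G = V @ R @ V.T                            # covariances among stored data
        c = V @ R[:, k + 1]                        # covariances with x_{k+1}
        coef = np.linalg.lstsq(G, c, rcond=None)[0]
        Y.append(coef @ V)
        mse.append(rr[0] - c @ coef)               # error power of the projection
    return np.array(Y), np.array(mse)
```

For the first-order moving average with $r_0 = 1.25$, $r_1 = 0.5$ and `n = 1`, the mean-square error starts at 1.05 and decreases toward 1, the variance of the driving white noise, which is the behavior expected when the filter order matches the process order.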

VI. MATCHED-ORDER PROCESSES 

We prove in Section 6.1 that when $\{x_k\}$ is of order n, the projecting-filter
design technique results in least mean-square time-varying prediction
in the sense that each output $y_k$ is the optimum linear combination
of the entire observed past of $\{x_k\}$. Thus $y_k$ is equal to the
output of the optimal nonrecursive filter of order $k + 1$ as described
in Section 3.2 so that

$$\lim_{k \to \infty} E (y_k - \hat{x}_{k+1})^2 = 0,$$

indicating that the design algorithm converges to the optimal unconstrained
predictor. In Section 6.2, we derive simple formulas for the
filter coefficients generated by the design procedure.

6.1 Optimality

We denote by $\mathcal{H}_k$ the subspace spanned by the random variables
in the filter memory at time k: $x'_k, x'_{k-1}, \ldots, x'_{k-n+1}, y_{k-1}, \ldots, y_{k-n}$;
and we denote by $\mathcal{R}_k$ the subspace spanned by the observed past of
$\{x_k\}$: $x_k, x_{k-1}, \ldots, x_0$. Note that another spanning set of $\mathcal{R}_k$ is
$e_k, e_{k-1}, \ldots, e_1, x_0$, where $\{e_k\}$ is the error sequence of the projecting
filter. This statement follows by induction since $x_0$ spans $\mathcal{R}_0$ and if
$\{e_j, e_{j-1}, \ldots, e_1, x_0\}$ spans $\mathcal{R}_j$, then $\{e_{j+1}, e_j, \ldots, e_1, x_0\}$ spans $\mathcal{R}_{j+1}$
because $e_{j+1} = x_{j+1} - y_j$ with $y_j$ in $\mathcal{R}_j$.

In this section we assume that $\{x_k\}$ is an nth-order process represented
by equation (9) with $\alpha_0 = 1$ so that

$$x_{j+1} = u_{j+1} - \sum_{i=1}^{n} \alpha_i x_{j+1-i} \tag{27}$$

in which $\{u_k\}$ is the moving average process with

$$u_{j+1} = \sum_{i=0}^{n} \beta_i \epsilon_{j+1-i} \tag{28}$$

and $\{\epsilon_k\}$ is a unit-mean-square white-noise process. Because $\sum_i \alpha_i z^i \ne 0$
for $|z| \le 1$, equation (27) may be expressed in the form

$$x_{k+1} = \sum_{i=0}^{\infty} h_i \epsilon_{k+1-i}, \tag{29}$$

in which $\{h_i\}$ is square summable. Equation (29) shows that

$$E \epsilon_{k+1} x_{k-i} = 0, \quad i \ge 0, \tag{30}$$

and equations (28) and (30) imply

$$E u_{j+1} x_{j-i} = 0, \quad i \ge n. \tag{31}$$

If we let $x^*_{k+1}$ denote the optimal "growing-memory" prediction of
$x_{k+1}$ with the projection characteristic

$$x^*_{k+1} = P\{x_{k+1} \mid \mathcal{R}_k\},$$

we have the following

Theorem: At each instant k, the time-varying filter output defined by

$$y_k = P\{x_{k+1} \mid \mathcal{H}_k\}$$

is the optimal growing-memory predictor in the sense that

$$y_k = x^*_{k+1}. \tag{32}$$

Proof: We will show that $x^*_{k+1} \in \mathcal{H}_k$, which implies equation (32) because
$\mathcal{H}_k \subset \mathcal{R}_k$. Clearly, for $0 \le k < n$, $\mathcal{H}_k = \mathcal{R}_k$ so that $y_k = x^*_{k+1}$. We assume
$y_k = x^*_{k+1}$ for all $k < j$ and show that this implies $y_j = x^*_{j+1}$. Hence,
by induction, equation (32) is valid for all k.

Let $j \ge n$ and assume equation (32) holds for all $k < j$. Then

$$E e_{k+1} x_{k-i} = 0, \quad \text{for } k = 0, 1, \ldots, j-1; \quad i = 0, 1, \ldots, k. \tag{33}$$

This implies that the vectors $e_j, e_{j-1}, \ldots, e_1, x_0$, which span $\mathcal{R}_j$, are
mutually orthogonal. Thus a projection into $\mathcal{R}_j$ is the sum of the
projections into each of these basis vectors. In particular

$$P\{u_{j+1} \mid \mathcal{R}_j\} = P\{u_{j+1} \mid x_0\} + \sum_{i=0}^{j-1} P\{u_{j+1} \mid e_{j-i}\}. \tag{34}$$

Now note that $e_{j-i} \in \mathcal{R}_{j-i}$ and that equation (31) states that
$u_{j+1} \perp \mathcal{R}_{j-i}$ for $i \ge n$. Thus the first term in equation (34) and all but the first
n terms of the summation are zero so that

$$P\{u_{j+1} \mid \mathcal{R}_j\} = \sum_{i=0}^{n-1} P\{u_{j+1} \mid e_{j-i}\}. \tag{35}$$

We now consider $x^*_{j+1}$ by noting that the projection operator is
linear and that $P\{x_{k-i} \mid \mathcal{R}_k\} = x_{k-i}$ for $i = 0, 1, \ldots, k$. Thus equation
(27) implies

$$x^*_{j+1} = P\{x_{j+1} \mid \mathcal{R}_j\} = P\{u_{j+1} \mid \mathcal{R}_j\} - \sum_{i=1}^{n} \alpha_i x_{j+1-i} \tag{36}$$

or, from equation (35),

$$x^*_{j+1} = \sum_{i=0}^{n-1} P\{u_{j+1} \mid e_{j-i}\} - \sum_{i=1}^{n} \alpha_i x_{j+1-i}. \tag{37}$$

Note that the ith term in the first summation is proportional to $e_{j-i}$
so that $x^*_{j+1}$ is a linear combination of $x_j, x_{j-1}, \ldots, x_{j-n+1}, y_{j-1}, \ldots,
y_{j-n}$, the basis vectors of $\mathcal{H}_j$. Thus $x^*_{j+1} \in \mathcal{H}_j$ and

$$x^*_{j+1} = P\{x_{j+1} \mid \mathcal{H}_j\} = y_j.$$

Hence $x^*_{k+1} = y_k$ for all k. Q.E.D.

6.2 Filter Coefficients 

In this section we derive explicit recursions for the coefficients and 
mean-square error of the optimal growing-memory predictor of a 
stationary nth-order process. We begin with equation (37) for the 
optimal prediction and observe that the projections have the form 

$$P\{u_{k+1} \mid e_{k-i}\} = \gamma_{ik} e_{k-i}, \quad i = 0, 1, \ldots, n-1, \tag{38}$$

where the coefficients are ratios of two expectations,

$$\gamma_{ik} = E u_{k+1} e_{k-i} / E e_{k-i}^2. \tag{39}$$

These expectations may be expressed as functions of the autocovariance
coefficients,

$$\varphi_i = E u_k u_{k-i}, \tag{40}$$

of the stationary moving average process $\{u_k\}$.

Our derivation begins with the expression of the error at step k,
$e_{k+1} = x_{k+1} - x^*_{k+1}$, as the difference between equation (27) for $x_{k+1}$
and equation (37) for $x^*_{k+1}$:

$$e_{k+1} = u_{k+1} - \sum_{i=0}^{n-1} \gamma_{ik} e_{k-i}. \tag{41}$$

Squaring equation (41) and taking the expectation we obtain

$$E e_{k+1}^2 = \varphi_0 - \sum_{i=0}^{n-1} \gamma_{ik}^2 E e_{k-i}^2 \tag{42}$$

which gives the mean-square error at step k in terms of current filter
coefficients and past errors. To find the next set of coefficients, $\gamma_{i,k+1}$,
we express $e_{k+1-i}$ as in equation (41) and we find the expected product
of this random variable and $u_{k+2}$. Then we divide by the mean-square
indicated in equation (39) with the result

$$\gamma_{n-1,k+1} = \varphi_n / E e_{k+2-n}^2,$$

$$\gamma_{i,k+1} = \Big[ \varphi_{i+1} - \sum_{j=0}^{n-i-2} \gamma_{j,k-i}\, \gamma_{i+j+1,k+1}\, E e_{k-i-j}^2 \Big] \Big/ E e_{k+1-i}^2, \quad i = n-2, n-3, \ldots, 0, \tag{43}$$

where the upper limit on the sum is a consequence of the property,
$E u_{k+2} e_{k-i-j} = 0$ for $j \ge n - i - 1$. [See equation (31)].

The filter coefficients $a_{ik}$ and $b_{ik}$ of equation (26) are related to the
projection coefficients $\gamma_{ik}$ and the autoregressive coefficients, $\alpha_i$, of the
process representation by

$$a_{ik} = \gamma_{ik} - \alpha_{i+1}, \qquad b_{ik} = -\gamma_{i-1,k}, \tag{44}$$

because equations (37) and (38) combine to form

$$x^*_{k+1} = \sum_{i=0}^{n-1} (\gamma_{ik} - \alpha_{i+1}) x_{k-i} - \sum_{i=1}^{n} \gamma_{i-1,k}\, y_{k-i}. \tag{45}$$

Our recursive technique for finding the characteristics of the optimal 
nth-order growing memory predictor thus consists of alternately per- 
forming the calculations of equations (42) and (43) and of obtaining 
the filter coefficients at each step by means of equation (45) . 
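
The alternation between equations (42) and (43) can be sketched for the special case of a pure moving-average process ($\alpha_i = 0$ for $i \ge 1$, so that $x_k = u_k$). The start-up handling is our assumption: we take $e_0 = x_0$ with $E e_0^2 = \varphi_0$ and simply omit errors that do not yet exist. The function name `ma_innovations_recursion` is hypothetical.

```python
import numpy as np

def ma_innovations_recursion(phi, steps):
    """Recursions (42)-(43) specialized to a pure moving average of order n.
    phi: autocovariances phi_0..phi_n of {u_k}.
    Returns (gammas, mse) with gammas[k][i] = gamma_{ik} and mse[m] = E e_m^2,
    assuming e_0 = x_0 so that mse[0] = phi_0."""
    phi = np.asarray(phi, dtype=float)
    n = len(phi) - 1
    mse = [phi[0]]
    gammas = []
    for k in range(steps):
        avail = min(n, k + 1)          # errors e_k, ..., e_{k-avail+1} exist
        gamma = np.zeros(n)
        for i in range(avail - 1, -1, -1):
            acc = phi[i + 1]           # E u_{k+1} u_{k-i}
            if k - i >= 1:
                prev = gammas[k - i - 1]           # gamma_{j, k-i-1}
                for j in range(n - i - 1):         # j = 0, ..., n-i-2
                    if k - i - 1 - j < 0:
                        break
                    acc -= prev[j] * gamma[i + j + 1] * mse[k - i - 1 - j]
            gamma[i] = acc / mse[k - i]
        gammas.append(gamma)
        # Equation (42): error power after removing the projections
        mse.append(phi[0] - sum(gamma[i] ** 2 * mse[k - i] for i in range(avail)))
    return gammas, mse
```

With `phi = [1.25, 0.5]` (a first-order moving average with weights 1 and 0.5), the mean-square errors run 1.25, 1.05, about 1.0119 and so on toward 1. In this pure moving-average case equation (44) would give $a_{0k} = \gamma_{0k}$ and $b_{1k} = -\gamma_{0k}$.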

6.3 Convergence of Filter Coefficients

Since the time-varying filter output $y_k$ converges to the optimal
unconstrained predictor $\hat{x}_{k+1}$, one would expect that the time-varying
coefficients $a_{ik}$ and $b_{ik}$ will converge to constant coefficients $a_i$ and $b_i$.
Since we have excluded processes whose spectral densities have zeros on
the unit circle, an nth-order recursive structure for the optimal predictor
is known to exist. 1
But this is not sufficient. It is also necessary to exclude the possibility
that the intrinsic order of the process is less than n. Then the coefficients
of the nth-order recursive equation for the optimal predictor are unique
and the time-varying coefficients $a_{ik}$ and $b_{ik}$ will in fact converge to
these constant coefficients.

6.4 Relation to Kalman Filtering 

In addition to proving convergence of our design approach, we have 
shown for the matched order case that the time-varying filter generated 
by the design procedure is the optimal growing-memory predictor. 
At each instant, k, the 2n stored data samples contain all the needed
information about the observed past of the process, $x_0, x_1, \ldots, x_k$.
It follows that the time-varying filter must be identical to the Kalman
predictor 3 which is obtained by expressing the process model in state
equation form. However, the Kalman development is computationally
less efficient as may be seen by comparing the Riccati equations with
the simpler recursions given in Section 6.2.

In recent months recursions similar to ours have been published in
various contexts. They appear in a paper by J. Rissanen and L. Barbosa 9
as steps in the factorization of the covariance matrix of $\{u_k\}$, the
nth-order moving average, and Kailath 10 has indicated that such
recursions follow from an innovations approach to prediction. Related
formulas also appear in R. L. Kashyap's 11 derivation of predictor
characteristics in terms of the parameters $\alpha_i$ and $\beta_i$ of the process
representation. In our derivation, as in Refs. 9 and 10, the basic data are the
set of $\alpha_i$ and the autocovariance function of $\{u_k\}$. In contrast, the
new design algorithm presented in Section VII uses only the covariances
of the process to be predicted, quantities that are often more accessible
in practice than the process parameters.

VII. SYNTHESIS TECHNIQUE 

In this section we apply the projecting-filter design approach of 
Section V to obtain a computational algorithm for the general case in 
which the order of the process may differ from the order of the filter. 
The basic idea of the approach is to compute successive sets of weight- 
ing coefficients for an nth-order time-varying recursive filter which 
asymptotically approaches the desired time-invariant projecting filter. 

As discussed in Section V, the time-varying projecting filter of in- 
terest is characterized by the input-output relationship 

$$y_k = P\{x_{k+1} \mid \mathcal{H}_k\} \tag{46}$$

where $\mathcal{H}_k$ denotes the subspace spanned by the 2n variates

$$x'_k, x'_{k-1}, \ldots, x'_{k-n+1}, y_{k-1}, y_{k-2}, \ldots, y_{k-n}.$$

Equation (46) uniquely defines $y_k$ as the projection of $x_{k+1}$ into $\mathcal{H}_k$.
This projection can be expressed explicitly as a linear combination of
the 2n variates; that is,

$$y_k = \sum_{i=0}^{n-1} a_{ik} x'_{k-i} + \sum_{i=1}^{n} b_{ik} y_{k-i}. \tag{47}$$

Let $d(\mathcal{H}_k)$ denote the dimension of the subspace $\mathcal{H}_k$, i.e., $d(\mathcal{H}_k)$ is
the minimum number of variates needed to span $\mathcal{H}_k$. If $d(\mathcal{H}_k) = 2n$
then the 2n spanning variates are linearly independent and the
coefficient set used in equation (47) is unique. On the other hand if
$d(\mathcal{H}_k) < 2n$, the 2n spanning variates are linearly dependent and consequently
there is an infinite number of possible choices for the coefficient set.
This situation always occurs in the first $2n - 1$ iterations ($0 \le k < 2n - 1$)
and it may occur as well in subsequent iterations. To overcome
this difficulty, we adopt a consistent procedure for selecting a linearly
independent subset of the 2n spanning variates for each k. Variates
are eliminated by setting appropriate coefficients to zero in equation
(47). The remaining coefficients are then uniquely determined from the
covariance matrix of the remaining variates and the cross-covariances
between the remaining variates and $x_{k+1}$.

The algorithm is initialized with $y_0 = a_{00} x_0$ and all of the other
coefficients $a_{i0}$ and $b_{i0}$ ($i \ne 0$) set to zero. Then each iteration consists
of the following steps: (i) solving for the appropriate coefficient values,
(ii) computing the needed covariances for the following iteration,
(iii) determining an independent set of variates for the next prediction.

7.1 Reduced Representation 

The procedure for eliminating dependent variates from the set of 
available data at time k leads to the following expression [equivalent 
to equation (47)] for the kth prediction 

$$y_k = \sum_{i=0}^{p-1} a_{ik} x_{k-i} + \sum_{i=1}^{q} b_{ik} y_{k-i} \tag{48}$$

with $p \le n$, $q \le n$.* The coefficients that do not appear in equation 
(48) are all set to zero in the process of eliminating dependent variates; 
that is, 

$$a_{ik} = 0, \quad i = p, p+1, \ldots, n-1;$$

$$b_{ik} = 0, \quad i = q+1, q+2, \ldots, n.$$

Note that $x_{k-i}$ rather than $x'_{k-i}$ appears in equation (48). This is so 
because $x'_{k-i} = 0$ for $i > k$, so that any set containing this variate is 
necessarily dependent. Hence, in the initial n steps, $p \le k + 1$. Section 
7.4 presents the general method by which a set of independent variates 
is determined. 

7.2 The Filter Equations 

With the prediction error defined as $e_{k+1} = x_{k+1} - y_k$, the projecting 
property implies the following orthogonality conditions 

$$E\,e_{k+1} x_{k-i} = 0, \quad i = 0, 1, \ldots, p-1;$$

$$E\,e_{k+1} y_{k-i} = 0, \quad i = 0, 1, \ldots, q. \tag{49}$$



*Note that p and q depend on k. They will be denoted p(k) and q(k) when 
ambiguity might otherwise arise. 

By substituting equation (48) for $y_k$ into equation (49), we obtain the 
following set of $d(\mathcal{H}_k)$ linear equations in the $d(\mathcal{H}_k)$ coefficients: 

$$r_{j+1} = \sum_{i=0}^{p-1} a_{ik} r_{j-i} + \sum_{i=1}^{q} b_{ik} w(k-j, k-i), \quad j = 0, 1, \ldots, p-1;$$

$$w(k+1, k-j) = \sum_{i=0}^{p-1} a_{ik} w(k-i, k-j) + \sum_{i=1}^{q} b_{ik} v(k-i, k-j), \quad j = 1, 2, \ldots, q; \tag{50}$$

in which we have adopted the notation: 

$$r_i = E\,x_k x_{k-i} = r_{-i},$$

$$w(k, j) = E\,x_k y_j,$$

$$v(k, j) = E\,y_k y_j.$$

The function $r_i$ comprises the given statistical information of the 
prediction problem, and w and v must be expressed as functions of $r_i$ and 
previously computed filter coefficients. 

Equations (50) have the following partitioned matrix form 

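$$\begin{bmatrix} T_p & X_k \\ X_k' & V_k \end{bmatrix} \begin{bmatrix} A_k \\ B_k \end{bmatrix} = \begin{bmatrix} R_p \\ W_k \end{bmatrix} \tag{51}$$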



with 

$T_p$ the $p \times p$ autocovariance matrix of $\{x_k\}$, 

$X_k$ the $p \times q$ cross-covariance matrix of $\{x_k\}$ and $\{y_k\}$, 

$V_k$ the $q \times q$ autocovariance matrix of $\{y_k\}$, 

$A_k = [a_{0k}, a_{1k}, \ldots, a_{p-1,k}]'$, 

$B_k = [b_{1k}, b_{2k}, \ldots, b_{qk}]'$, 

$R_p = [r_1, r_2, \ldots, r_p]'$, 

$W_k = [w(k+1, k-1), \ldots, w(k+1, k-q)]'$. 

Note that $T_p$ and $R_p$ depend only on the given autocovariance function 
$r_i$ and on p, the number of forward coefficients to be computed. They 
are independent of previously computed coefficients. 

If we perform the multiplication indicated in equation (51) and 
then solve for A k and B k we derive 



$$B_k = [V_k - X_k' U_p X_k]^{-1} [W_k - X_k' C_p],$$

$$A_k = C_p - U_p X_k B_k, \tag{52}$$

where $U_p = T_p^{-1}$ and $C_p = U_p R_p$, the column matrix of weights 
corresponding to the optimum pth-order nonrecursive predictor. By using 
efficient algorithms developed for the analysis of nonrecursive 
predictors,$^{12,13}$ one may successively calculate $U_0, U_1, \ldots, U_{n-1}$, 
$C_0, C_1, \ldots, C_{n-1}$ before the start of the synthesis procedure so that at 
the kth step, only a $q \times q$ matrix inversion [rather than one of order 
(p + q)] is required. We are assured that the matrix to be inverted 
is nonsingular because we have eliminated dependent variates by 
reducing the number of unknowns from 2n to p + q. Note that $A_k$ 
consists of the coefficients of the optimum pth-order nonrecursive 
predictor modified by $U_p X_k B_k$, which indicates the effect of the 
feedback section of the predicting filter. 
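
The elimination that leads to equation (52) can be checked numerically. The following is a minimal sketch, assuming small hypothetical covariance blocks chosen only so that the full matrix of equation (51) is nonsingular. The helper names `solve` and `block_solve` and all numerical values are illustrative assumptions, not part of the original design procedure. The sketch solves the partitioned system once directly and once through the reduced q x q system.

```python
from fractions import Fraction


def solve(M, rhs):
    """Solve M z = rhs exactly by Gauss-Jordan elimination (M must be nonsingular)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(rhs[i])] for i, row in enumerate(M)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)  # first nonzero pivot
        aug[c], aug[piv] = aug[piv], aug[c]
        d = aug[c][c]
        aug[c] = [v / d for v in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [aug[i][n] for i in range(n)]


def block_solve(T, X, V, R, W):
    """Return (A, B) as in equation (52); only a q x q system involves V and W."""
    p, q = len(T), len(V)
    C = solve(T, R)  # C_p = U_p R_p, weights of the pth-order nonrecursive predictor
    # UX[j] is column j of U_p X_k, obtained by solving T_p z = (column j of X_k).
    UX = [solve(T, [X[i][j] for i in range(p)]) for j in range(q)]
    # S = V_k - X_k' U_p X_k  and  rhs = W_k - X_k' C_p
    S = [[V[a][b] - sum(X[i][a] * UX[b][i] for i in range(p)) for b in range(q)]
         for a in range(q)]
    rhs = [W[a] - sum(X[i][a] * C[i] for i in range(p)) for a in range(q)]
    B = solve(S, rhs)
    A = [C[i] - sum(UX[j][i] * B[j] for j in range(q)) for i in range(p)]
    return A, B


# Hypothetical blocks with p = 2, q = 1 (illustrative numbers, not from the paper).
T = [[2, 1], [1, 2]]
X = [[1], [0]]
V = [[3]]
R = [1, 0]
W = [1]

A, B = block_solve(T, X, V, R, W)

# Direct solution of the full (p + q) x (p + q) system of equation (51).
full = [[2, 1, 1], [1, 2, 0], [1, 0, 3]]
direct = solve(full, R + W)
print(A + B == direct)
```

Exact rational arithmetic is used here only so that the two solutions can be compared for equality; a practical design program would more likely use floating-point routines.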

7.3 Obtaining Successive Covariance Statistics 

The nature of w(k, j) depends on which time index is the greater. 
If $j \ge k$, we observe that the projection property of the jth estimate 
implies that $E\,x_k e_j = 0$ for $k = j-1, j-2, \ldots, j-n$. Thus if we 
substitute $x_{j+1} - e_{j+1}$ for $y_j$ in the definition of w(k, j), we obtain 

$$w(k, j) = E[(x_{j+1} - e_{j+1}) x_k] = r_{j+1-k}, \quad j = k, k+1, \ldots, k+n-1. \tag{53}$$

For j < k, we substitute equation (26) for $y_j$ in the definition of w(k, j), 
with the result 

$$w(k, j) = \sum_{i=0}^{n-1} a_{ij} r_{j-i-k} + \sum_{i=1}^{n} b_{ij} w(k, j-i), \quad j = 0, 1, \ldots, k-1. \tag{54}$$

Equation (54) indicates that $\{w(k, 0), w(k, 1), \ldots, w(k, k-1)\}$ is 
the sequence of filter outputs when $\{r_{-k}, r_{-k+1}, \ldots, r_{-1}\}$ is the sequence 
of inputs. This is an example of the property of linear filters that the 
cross-covariance between input and output is the correlation of the 
filter impulse response with the input autocovariance function. Using 
the initial conditions w(k, j) = 0 for j < 0, we may iteratively apply 
equation (54) in order to compute the required values of w(k, j) for 
j < k. 
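
The recursion of equation (54) can be sketched as a short routine that runs the time-varying filter over the shifted autocovariance sequence. This is a minimal illustration under stated assumptions: the function name `cross_cov`, the list-of-lists layout of the coefficients, and the first-order numerical example are all hypothetical, and forward terms that would refer to samples before the start of the process are simply skipped.

```python
from fractions import Fraction


def cross_cov(k, a, b, r):
    """Return [w(k, 0), ..., w(k, k-1)] by running equation (54) forward in j.

    a[j][i] is a_{ij} (i = 0, 1, ...), b[j][i-1] is b_{ij} (i = 1, 2, ...),
    and r(i) is the autocovariance r_i of the input process.
    """
    w = []
    for j in range(k):
        # Forward part: terms with j - i < 0 refer to samples before the start
        # of the process and are skipped (assumed to carry zero weight).
        total = sum(a[j][i] * r(j - i - k) for i in range(len(a[j])) if j - i >= 0)
        # Feedback part: w(k, j - i) = 0 for j - i < 0 (initial conditions).
        total += sum(b[j][i - 1] * w[j - i]
                     for i in range(1, len(b[j]) + 1) if j - i >= 0)
        w.append(total)
    return w


# Hypothetical example: r_i = (1/2)^|i|, a_{0j} = 1/2 for all j, b_{11} = 1/4.
r = lambda i: Fraction(1, 2) ** abs(i)
a = [[Fraction(1, 2)], [Fraction(1, 2)]]
b = [[], [Fraction(1, 4)]]
w2 = cross_cov(2, a, b, r)
print(w2)
```

For this example $y_0 = a_{00} x_0$ and $y_1 = a_{01} x_1 + b_{11} y_0$, so the two values returned can be confirmed by hand as $w(2, 0) = a_{00} r_2$ and $w(2, 1) = a_{01} r_1 + b_{11} w(2, 0)$.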

The autocovariance coefficients of $\{y_k\}$ may be determined from the 
orthogonality conditions. With $k - n \le j \le k$, we have $E\,e_{k+1} y_j = 0$ 
so that 

$$v(k, j) = E[(x_{k+1} - e_{k+1}) y_j] = w(k+1, j), \quad j = k-n, \ldots, k, \tag{55}$$

and of course v(j, k) = v(k, j). Thus, equations (53), (54), and (55) 
express, in terms of known quantities, the covariance coefficients that 
appear in equation (51). 

7.4 The Number of Independent Variates 

In Section 7.2 we have assumed that p and q, the number of forward 
coefficients and the number of feedback coefficients to be computed 
at time k, are determined in a manner that assures the linear independence 
of the p + q variates that appear in equation (48) and, therefore, the 
existence of the inverse matrix of equation (52). In many instances 
p = q = n, so that all of the data in the predictor memory are linearly 
independent. On the other hand, there are two conditions under which 
the data are dependent. The first is called an initialization condition, 
and this arises in the course of every synthesis procedure because the 
predictor begins to operate at k = 0 with zero in all memory elements 
except one. The initialization condition obtains for the first 2n - 1 
iterations of the design procedure, during which $d(\mathcal{H}_k) \le k + 1 < 2n$ 
because $\mathcal{H}_k \subset \mathcal{R}_k$ and $d(\mathcal{R}_k) = k + 1$. The other condition under which 
$d(\mathcal{H}_k) < 2n$ is called a reduced order condition, which arises when 
certain of the final feedback coefficients and/or final forward 
coefficients are zero. A reduced-order condition arises for all processes of 
order less than n. 

7.4.1 Initialization 

In this section we assume that no reduced order condition arises 
during the first 2n - 1 steps of the predictor synthesis. This implies 
that $d(\mathcal{H}_k) = k + 1$ so that p + q, the number of coefficients determined 
by orthogonality conditions, increases by one at each iteration. At 
k = 0, the predictor estimates $x_1$ given $x_0$, which implies p = 1, q = 0. 
For increasing k, we alternately increase q and p by one so that for 
$0 \le k \le 2n - 2$ 

$$p = 1 + \tfrac{1}{2}k, \quad q = \tfrac{1}{2}k, \quad k \text{ even};$$

$$p = \tfrac{1}{2}(k + 1) = q, \quad k \text{ odd}; \tag{56}$$

when no reduced order condition arises. Table I shows the variates 
that appear in equation (48) during the initial design stages of a second- 
order predictor. 
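
The alternating schedule of equation (56) is easy to tabulate. The sketch below is only an illustration: the function name `init_counts` is an assumption, and the schedule is valid only while no reduced-order condition arises.

```python
def init_counts(k, n):
    """Return (p, q) for 0 <= k <= 2n - 2 when no reduced-order condition arises."""
    if not 0 <= k <= 2 * n - 2:
        raise ValueError("initialization schedule applies only for 0 <= k <= 2n - 2")
    if k % 2 == 0:
        return 1 + k // 2, k // 2          # k even: p = 1 + k/2, q = k/2
    return (k + 1) // 2, (k + 1) // 2      # k odd:  p = q = (k + 1)/2


# Second-order predictor (n = 2): the counts for k = 0, 1, 2.
schedule = [init_counts(k, 2) for k in range(3)]
print(schedule)  # [(1, 0), (1, 1), (2, 1)]
```

In every case p + q = k + 1, which matches the statement that the number of coefficients determined by the orthogonality conditions increases by one at each iteration.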

Table I. Steps in Predictor Design 

Time   Predicted Variate   Independent Data                   Projection 
0      x_1                 x_0                                y_0 
1      x_2                 x_1, y_0                           y_1 
2      x_3                 x_2, x_1, y_1                      y_2 
3      x_4                 x_3, x_2, y_2, y_1                 y_3 
k      x_{k+1}             x_k, x_{k-1}, y_{k-1}, y_{k-2}     y_k 

7.4.2 Reduced-Order Condition 

At time k + 1, the dependency of the data in storage can be deduced 
by observation of the coefficients computed at time k. In this 
section we show how the values of certain coefficients, in particular 
whether or not they are zero, determine the relationship between 
$d(\mathcal{H}_k)$ and $d(\mathcal{H}_{k+1})$, the numbers of linearly independent variates in 
storage at time k and at time k + 1. In the next section we present 
the algorithm for determining the number of forward coefficients and 
the number of feedback coefficients to be computed at each step of 
the design. 

The following theorem states that there is a dependence among the 
variates in storage at time k + 1 if and only if the coefficients deter- 
mined at time k correspond to a filter of order less than n. 

Theorem: With $d(\mathcal{H}_k) = 2n$, $d(\mathcal{H}_{k+1}) = 2n - 1$ if and only if 
$a_{n-1,k} = b_{n,k} = 0$. Otherwise $d(\mathcal{H}_{k+1}) = 2n$. 

Proof: Assume $a_{n-1,k} = b_{n,k} = 0$. Then 

$$y_k = \sum_{i=0}^{n-2} a_{ik} x_{k-i} + \sum_{i=1}^{n-1} b_{ik} y_{k-i},$$

which shows the linear dependency of the following variates in storage 
at time k + 1: $x_k, x_{k-1}, \ldots, x_{k-n+2}, y_k, \ldots, y_{k-n+1}$. Thus 
$d(\mathcal{H}_{k+1}) < 2n$. On the other hand, the 2n - 1 variates 
$x_{k+1}, x_k, \ldots, x_{k-n+2}, y_{k-1}, \ldots, y_{k-n+1}$ are linearly independent. 
All except $x_{k+1}$ are independent 
because they are in storage at time k and $d(\mathcal{H}_k) = 2n$. In addition, 
the assumption that $\{x_k\}$ is nondeterministic implies that $x_{k+1}$ cannot 
be expressed as a linear combination of the other stored variates 
because each of these is in $\mathcal{R}_k$. It follows that $d(\mathcal{H}_{k+1}) = 2n - 1$. 

To prove the converse, assume $d(\mathcal{H}_{k+1}) = 2n - 1$. It follows that 
there exists a linearly dependent set of stored data. By the reasoning 
given above this set does not include $x_{k+1}$ because all of the other stored 
variates are in $\mathcal{R}_k$. However, the set does include $y_k$ because all of the 
other variates are independent. Hence $y_k$ can be represented as a linear 
combination of $x_k, x_{k-1}, \ldots, x_{k-n+2}, y_{k-1}, \ldots, y_{k-n+1}$. But the data 
in storage at time k also include $x_{k-n+1}$ and $y_{k-n}$, and the fact that 
$d(\mathcal{H}_k) = 2n$ implies that the representation of $y_k$ is unique. Therefore 
the coefficients of $x_{k-n+1}$ and $y_{k-n}$ vanish: $a_{n-1,k} = b_{n,k} = 0$. Q.E.D. 

By reasoning similar to that used to prove this theorem we may 
establish the dimensionality of the data in storage at time k + 1 when 
$d(\mathcal{H}_k) < 2n$. Thus we have the following corollaries, which apply for 
all k including the initial steps of the predictor design. 

Corollary 1: With $d(\mathcal{H}_k) = p + q$ and p = q < n, $d(\mathcal{H}_{k+1}) = p + q - 1$ 
if and only if $a_{p-1,k} = b_{q,k} = 0$. Otherwise $d(\mathcal{H}_{k+1}) = p + q + 1$. 

Corollary 2: With $d(\mathcal{H}_k) = p + q$ and $n \ge p = q + 1$, $d(\mathcal{H}_{k+1}) = p + q$ 
if and only if $a_{p-1,k} = 0$. Otherwise $d(\mathcal{H}_{k+1}) = p + q + 1$. 

Corollary 3: With $d(\mathcal{H}_k) = p + q$ and p = q - 1 < n, $d(\mathcal{H}_{k+1}) = p + q$ 
if and only if $b_{q,k} = 0$. Otherwise $d(\mathcal{H}_{k+1}) = p + q + 1$. 

7.4.3 The Number of Computed Coefficients 

On the basis of the theorem and corollaries of Section 7.4.2, we 
establish the procedure shown in Table II for determining the numbers 
of forward and feedback coefficients p(k + 1) and q(k + 1) to be 
computed at time k + 1. The table indicates that p(k + 1) and q(k + 1) 
may be determined from p = p(k) and q = q(k) (shown in the left 
column) and from the final two feedback coefficients and the final 
two forward coefficients (shown in the central four columns) computed 
at time k. If there is no entry for one of the coefficients, the indicated 
relationship between p(k + 1), q(k + 1) and p, q is independent of 
that coefficient. The other symbols indicate that a coefficient must 
necessarily be zero or nonzero for a relationship to be valid. 

Table II. The Number of Coefficients Computed 

      Number of         Final Coefficients                           Number of Coefficients 
      Coefficients      Computed at Time k                           Computed at Time k + 1 
Line  at Time k         b_{q,k}  b_{q-1,k}  a_{p-1,k}  a_{p-2,k}     p(k+1)    q(k+1) 
 1    p = q = n         ≠0                                           n         n 
 2                      0                   ≠0                       n         n 
 3    p = q             0        ≠0         0                        p         q - 1 
 4                      0        0          0          ≠0            p - 1     q 
 5    p = q < n         ≠0                                           p + 1     q 
 6                      0                   ≠0                       p         q + 1 
 7    p > q                                 ≠0                       p         q + 1 
 8                      ≠0                  0                        p         q 
 9                      0                   0          ≠0            p - 1     q + 1 
10    p < q             ≠0                                           p + 1     q 
11                      0                   ≠0                       p         q 
12                      0        ≠0         0                        p + 1     q - 1 
13    any p, q          0        0          0          0             irregular 

If, at time k, $p + q = d(\mathcal{H}_k)$, the variates $x_k, x_{k-1}, \ldots, x_{k-p+1}, 
y_{k-1}, \ldots, y_{k-q}$ are independent. This condition and the theorem and 
corollaries imply that the set $\{x_{k+1}, x_k, \ldots, x_{k-p(k+1)+2}, y_k, \ldots, 
y_{k-q(k+1)+1}\}$ is independent and spans $\mathcal{H}_{k+1}$. Thus lines 1 and 2 of 
Table II follow from the theorem; lines 3 through 6, from the theorem 
and Corollary 1; lines 7 through 9, from Corollary 2; and lines 10 
through 12, from Corollary 3. 
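
The selection rule can also be written as a short decision function. What follows is a sketch of one reading of the rule, derived from the theorem and Corollaries 1 through 3 rather than copied from the table. The function name `next_counts`, the argument names, the order in which the coefficients are tested, and the use of an exact comparison with zero are all assumptions; in floating-point arithmetic a tolerance would be needed.

```python
def next_counts(p, q, n, b_q=0, b_q1=0, a_p1=0, a_p2=0):
    """Return (p(k+1), q(k+1)) from the counts and final coefficients at time k.

    b_q, b_q1 stand for b_{q,k}, b_{q-1,k}; a_p1, a_p2 for a_{p-1,k}, a_{p-2,k}.
    A coefficient that does not exist (for example b_{q,k} when q = 0) is passed as 0.
    """
    if p == q:
        if b_q != 0:                       # y_k can replace y_{k-q}
            return (p, q) if p == n else (p + 1, q)
        if a_p1 != 0:                      # y_k can replace x_{k-p+1}
            return (p, q) if p == n else (p, q + 1)
        if b_q1 != 0:                      # reduced order: drop one feedback term
            return p, q - 1
        if a_p2 != 0:                      # reduced order: drop one forward term
            return p - 1, q
    elif p > q:                            # p = q + 1 (Corollary 2)
        if a_p1 != 0:
            return p, q + 1
        if b_q != 0:
            return p, q
        if a_p2 != 0:
            return p - 1, q + 1
    else:                                  # p = q - 1 (Corollary 3)
        if b_q != 0:
            return p + 1, q
        if a_p1 != 0:
            return p, q
        if b_q1 != 0:
            return p + 1, q - 1
    raise ValueError("irregular condition: no rule applies")


# Regular start-up of a second-order design with all existing final coefficients nonzero.
counts = [(1, 0)]
for _ in range(4):
    p, q = counts[-1]
    counts.append(next_counts(p, q, 2, b_q=1.0 if q > 0 else 0,
                              b_q1=1.0, a_p1=1.0, a_p2=1.0))
print(counts)  # [(1, 0), (1, 1), (2, 1), (2, 2), (2, 2)]
```

With all coefficients nonzero the function reproduces the initialization schedule and then holds p = q = n, which is the behavior expected when no reduced-order condition arises.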

The table accounts for all possible combinations of computed 
coefficient values except those in which the last two forward coefficients 
and the last two feedback coefficients are all zero. This situation arises 
during the initial design stages whenever the input process is partially 
decorrelated. The manner in which independent variates are chosen 
for such a process is described in Section 7.4.5. When the irregularity 
arises in the design of predictors for other processes, there is no 
independent basis of $\mathcal{H}_{k+1}$ that is the union of consecutive members of 
$\{x_k\}$ beginning with $x_{k+1}$ and consecutive members of $\{y_k\}$ beginning 
with $y_k$. Thus it is impossible to represent the resulting projection 
onto $\mathcal{H}_{k+1}$ in the concise 
form of equation (48). Nor is it possible in general to determine at all 
times subsequent to k an independent set of stored data solely by 
considering p, q and the previously computed coefficients. All this 
complicates quite substantially the representation of $y_k$, the 
equations which determine the coefficients, and the algorithm for 
determining the numbers of coefficients to be computed after the 
occurrence of the irregular condition indicated on the last line of Table II. 

Rather than add substantially to the size of this paper by presenting 
a general technique for treating this situation, we simply note that 
except for partially decorrelated processes, it has never arisen in our 
experience of designing projecting filters and that in fact it appears to 
represent a pathological case. We have not discovered an example of a 
process for which four projection coefficients are simultaneously zero 
after one or more of their counterparts is nonzero at the previous 
time instant. 

7.4.4 Low Order Processes 

When $\{x_k\}$ is the response of an mth-order filter to white noise and 
m is no greater than n, the order of the predictor, the synthesis method 
leads to the mth-order form of the optimum unconstrained predictor. 
Section 6.1 contains a proof of this statement for m = n, and in this 
section we show that if m < n, a reduced-order situation arises and the 
effective order of the predictor does not grow beyond m. 

Let $a_{ik}$ and $b_{ik}$ be the coefficients of the optimal growing-memory 
mth-order predictor, determined in the manner indicated in Section 
6.2. Thus 

$$x^*_{k+1} = \sum_{i=0}^{m-1} a_{ik} x_{k-i} + \sum_{i=1}^{m} b_{ik} x^*_{k+1-i}. \tag{57}$$

Note that for all $k \le 2m - 1$, $y_k$, the output of the nth-order predictor, 
is identical to $x^*_{k+1}$ because the design proceeds as for a predictor of 
order m. 

Equation (56) indicates that at step 2m the initialization procedure 
leads to p = m + 1, q = m and 

$$y_{2m} = \sum_{i=0}^{m} a'_{i,2m} x_{2m-i} + \sum_{i=1}^{m} b'_{i,2m} y_{2m-i} \tag{58}$$

where $a'_{i,2m}$ and $b'_{i,2m}$ are determined uniquely by the orthogonality 
conditions. Hence it follows from the optimality of equation (57) 
that $a'_{m,2m} = 0$ and that the other coefficients are equal to the ones in 
equation (57) with k = 2m. Line 8 of Table II indicates that p(2m + 1) = 
m + 1 and q(2m + 1) = m, and once again we have $a'_{m,2m+1} = 0$ and 
the other coefficients equal to those in equation (57) for the optimal 
mth-order predictor. It is clear that for all $k \ge 2m$ this sequence is 
repeated with p(k) = m + 1, q(k) = m and $a'_{m,k} = 0$. Hence the 
algorithm converges to the unique mth-order form of the unconstrained 
optimal predictor. 

7.4.5 Partially Decorrelated Input Process 

A partially decorrelated process is a nonwhite process for which 
every set of j + 1 (j > 0) adjacent samples is uncorrelated. In other 
words, $\{x_k\}$ is partially decorrelated if for some j > 0, 
$r_1 = r_2 = \cdots = r_j = 0$ and $r_{j+1} \ne 0$. For example, the error process of an nth-order 
projecting filter is partially decorrelated with j = n. 
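
The definition can be checked directly from a finite list of autocovariance coefficients. The following is a minimal sketch; the function name `decorrelation_index`, the tolerance argument, and the sample values are illustrative assumptions only.

```python
def decorrelation_index(r, tol=0.0):
    """Return j > 0 with r_1 = ... = r_j = 0 and r_{j+1} != 0, or None.

    r is a list [r_0, r_1, r_2, ...] of autocovariance coefficients; values with
    |r_i| <= tol are treated as zero (the tolerance is an assumption of this sketch).
    """
    for i in range(1, len(r)):
        if abs(r[i]) > tol:
            j = i - 1
            return j if j > 0 else None  # j = 0 means adjacent samples are correlated
    return None  # no nonzero r_i found for i >= 1 among the supplied lags


print(decorrelation_index([1.0, 0.0, 0.0, 0.5]))  # 2
```

A return value of None covers both an ordinary correlated process ($r_1 \ne 0$) and a sequence that is white as far as the supplied lags can show.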

Note that with a partially decorrelated input, the initial j generating 
filter outputs (corresponding to optimal nonrecursive predictions) 
are zero. Thus 

$$y_k = x^*_{k+1} = 0 = a_{ik} = b_{ik}, \quad \text{for } 0 \le k < j \text{ and all } i. \tag{59}$$

This is a reduced-order situation conforming to line 13 of Table II 
(if we assume $b_{0k} = 0$ and $a_{ik} = b_{ik} = 0$ for i < 0). For this 
irregular case we adopt the following initialization procedure as an 
alternative to equation (56). 

(i) All coefficients are 0 for k < j. 

(ii) p(j) = j + 1, q(j) = 0. 

(iii) p(k), q(k) according to Table II for k > j. 

VIII. CONCLUSIONS 

This paper introduces the projecting-filter principle of constrained- 
order recursive prediction and presents one technique of projecting 
filter synthesis. This technique has led to the design of the predictors 
described in Section 1.3 and to several other successful designs for a 
variety of random processes. However, the class of processes for which 
the technique is valid (that is, for which the algorithm converges to 
a time-invariant filter) and indeed the class for which a projecting 
filter of a given order exists have not as yet been determined. These 
questions are the subject of current research. Another important area 
of investigation involves the numerical aspect of the synthesis — the 
study of the sensitivity of this or any other design method to round- 
off in the calculation of coefficients. 

Our studies to date indicate that the projecting filter is valuable in 
that it predicts many processes more accurately than other known 
devices of equal complexity. Our results are readily extended to vector- 
valued processes. Finally, we note that the projecting filter principle 
is applicable to a large class of estimation problems of which prediction 
one unit of time in the future is but a single example. 